
Optimized speaker change detection approach for speaker segmentation towards speaker diarization based on deep learning

K, VijayKumar; R, Rajeswara Rao.

Data & Knowledge Engineering ; : 102121, 2022.

Article in English | ScienceDirect | ID: covidwho-2122412

ABSTRACT

Speaker diarization is the partitioning of an audio source stream into homogeneous segments according to the speaker’s identity. It can improve the readability of an automatic speech transcription by segmenting the audio stream into speaker turns and, when used in combination with speaker recognition systems, by identifying the speaker’s true identity. Automatic speaker diarization is generally performed in two phases: the transformation of audio segments into a feature representation, and clustering. In this paper, clustering combined with a hybrid optimization technique is used to perform speaker diarization. The features extracted from the audio signal are first processed by speech activity prediction in order to identify the speech segments. Diarization is then performed by Deep Embedded Clustering (DEC), whose constants are trained by the developed Fractional Anticorona Whale Optimization Algorithm (FrACWOA). FrACWOA is a hybrid optimization technique that combines the concept of fractional theory, the precautionary behaviour adopted against COVID-19, and the hunting behaviour of whales. DEC uses neural networks to learn the feature representation and the cluster assignments simultaneously. Using a mapping from the information space to a lower-dimensional feature space, DEC iteratively optimizes a clustering objective. With six speakers on the EenaduPrathidwani dataset, the developed FrACWOA+DEC algorithm achieved a testing accuracy of 0.902, a diarization error of 0.627, a false discovery rate (FDR) of 0.276, a false negative rate (FNR) of 0.117, and a false positive rate (FPR) of 0.118.
Compared with the existing approaches Active learning, DE+K-means, LSTM, MCGAN, ANN-ABC-LA, and ACWOA+DFC, the accuracy of the proposed method with six speakers is higher by 12.97%, 10.31%, 9.75%, 7.53%, 4.32%, and 2.106%, respectively.
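
To make the clustering objective mentioned in the abstract more concrete, the following is a minimal sketch of the standard DEC soft assignment, target distribution, and KL-divergence loss, written in Python with NumPy. It is not the authors' implementation: the embeddings are random stand-ins for encoded speech segments, the function names and the choice of three clusters are hypothetical, and the FrACWOA training step described in the abstract is not shown.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Student's t soft assignment q_ij of embedding i to cluster j."""
    # Squared Euclidean distances between embeddings and centroids: shape (n, k).
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target p_ij = (q_ij^2 / f_j) normalized per row, f_j = sum_i q_ij."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_objective(p, q, eps=1e-12):
    """Clustering loss KL(P || Q) that DEC minimizes."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical data: 8 segment embeddings in a 4-dimensional feature space.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
# Assumption for illustration: initialize 3 centroids from the first 3 embeddings.
centroids = z[:3].copy()

q = soft_assign(z, centroids)
p = target_distribution(q)
loss = kl_objective(p, q)
labels = q.argmax(axis=1)  # hard cluster (speaker) label per segment
```

In a full system the encoder weights and centroids would be updated to reduce `loss`; the abstract states that this tuning is carried out by FrACWOA, whereas the sketch only evaluates the objective for fixed embeddings.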
